Multilingual Corpus-based Approach to the Resolution of English -ing

Authors

  • Lee Schwartz
  • Takako Aikawa

Abstract

Corpus data has proven useful for dealing with ambiguities in NLP. A number of studies, for example, have dealt with disambiguating English PP attachment using corpus data (Hindle and Rooth 1993; Brill and Resnik 1994; Stetina and Nagao 1997; Ratnaparkhi 1998; Pantel and Lin 2000; among others). This paper explores a novel approach to resolving ambiguities associated with -ing + noun constructions in English. We use an aligned multilingual (English, Spanish, French, German and Japanese) corpus to extract the lexical information necessary for disambiguation. Our premise is that while -ing constructions are highly ambiguous in English, the corresponding constructions in other languages may not be, and can thus provide disambiguating information for English. We argue that, with aligned multilingual corpora, languages can learn non-trivial linguistic information from one another.

1. Ambiguities in English -ing constructions

Different syntactic and semantic relationships can exist in English between an -ing verb form and a following noun. At the syntactic level, an NLP system must decide whether an -ing + noun construction is a verb + object pair or a modifier + noun pair. For example, in (1a) using is a verb with the object passwords, whereas in (1b) testing is a modifier of purposes.

(1a) Click to learn more about using passwords with your identity.
(1b) For testing purposes, click Next.

For the purpose of translation, we often need to specify what type of modification relationship holds between an -ing form and a following noun in a noun phrase. In (1b) the relationship of testing to purposes might be considered one of adjunct to noun, as in the paraphrase purposes of testing. But in other constructions that are syntactically similar, the noun following the -ing form may be better thought of as the subject of the -ing verb.
So, in (1c) the noun rows might be interpreted as the subject of matching, as in the paraphrase rows that match.

(1c) It specifies that matching rows returned by the query match a list of words.

By contrast, a similar paraphrase (purposes that test) is not possible for (1b). In this paper we explore the automatic extraction of the information necessary to distinguish verb + object constructions (such as (1a)) from modifier + noun constructions (such as (1b) and (1c)).

2. -ing constructions in other languages

While the English -ing + noun construction is often ambiguous, other languages use various linguistic devices, often unambiguous in nature, to instantiate the different relationships between the parts of the construction. For example, the NP licensing information in (2a), in which licensing modifies the noun information (i.e., 'information for licensing'), is likely to be expressed as a compound noun in languages such as Japanese or German, as shown in (2b) and (2c). In languages such as French or Spanish, on the other hand, the same type of modifier + noun relationship is likely to be expressed as a noun + prepositional phrase construction ('information about licensing'), as shown in (2d) and (2e).

(2a) English: When the number of users is different from the number of computers, this may provide incorrect licensing information.
(2b) Japanese: ユーザーの数がコンピュータの数と異なる場合は、正しいライセンス情報が
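
The reasoning in Section 2, that the form an aligned translation takes can reveal which reading an English -ing + noun pair has, can be illustrated with a minimal Python sketch. It is not the authors' method, which this excerpt does not describe; it simply takes a majority vote over translations. Everything in it is hypothetical: the pattern labels (`COMPOUND_NOUN`, `NOUN_PLUS_PP`, `VERB_PLUS_OBJECT`), the `EVIDENCE` mapping, `AlignedRealization`, and `resolve_ing_noun`. It also assumes that an upstream aligner and parser have already labeled how each translation realizes the pair.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical labels for how an aligned translation realizes an English
# "-ing + noun" pair. They are assumed to come from some upstream word
# aligner and target-language parser, which this sketch does not model.
COMPOUND_NOUN = "compound_noun"        # e.g. a Japanese or German compound
NOUN_PLUS_PP = "noun_plus_pp"          # e.g. a French or Spanish noun + PP
VERB_PLUS_OBJECT = "verb_plus_object"  # -ing form rendered as a verb taking the noun as object

# Assumed (and deliberately coarse) mapping from a target-language
# realization to the English reading it is taken as evidence for.
EVIDENCE = {
    COMPOUND_NOUN: "modifier+noun",
    NOUN_PLUS_PP: "modifier+noun",
    VERB_PLUS_OBJECT: "verb+object",
}


@dataclass(frozen=True)
class AlignedRealization:
    language: str  # code of the translation language, e.g. "ja", "de", "fr", "es"
    pattern: str   # one of the hypothetical pattern labels above


def resolve_ing_noun(realizations: Iterable[AlignedRealization]) -> Optional[str]:
    """Return the English reading that most aligned translations point to.

    Realizations with unknown patterns are ignored. Returns None when no
    translation gives usable evidence or when the top readings are tied,
    leaving the English construction unresolved.
    """
    votes = Counter(
        EVIDENCE[r.pattern] for r in realizations if r.pattern in EVIDENCE
    )
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # the translations disagree evenly
    return ranked[0][0]


# Hypothetical labels for "licensing information" in (2a), following the
# realizations the text expects in each language.
licensing = [
    AlignedRealization("ja", COMPOUND_NOUN),
    AlignedRealization("de", COMPOUND_NOUN),
    AlignedRealization("fr", NOUN_PLUS_PP),
    AlignedRealization("es", NOUN_PLUS_PP),
]
print(resolve_ing_noun(licensing))  # modifier+noun
```

Under these assumptions, all four translations of licensing information vote for the modifier + noun reading. The mapping is intentionally coarse: a gerund and its object can also be nominalized in translation (French, for instance, may render using passwords as a noun + prepositional phrase), so a practical system would need finer-grained patterns or corpus-level weighting rather than a single lookup table.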

Similar resources

Disambiguation of English PP Attachment using Multilingual Aligned Data

Prepositional phrase attachment (PP attachment) is a major source of ambiguity in English. It poses a substantial challenge to Machine Translation (MT) between English and languages that are not characterized by PP attachment ambiguity. In this paper we present an unsupervised, bilingual, corpus-based approach to the resolution of English PP attachment ambiguity. As data we use aligned linguist...

Coreference Resolution in a Multilingual Information Extraction System

We present in this paper the coreference mechanism implemented in the M-LaSIE system, a prototype multilingual Information Extraction (IE) system. We describe an experiment in which texts from a parallel French/English corpus were marked up manually and processed by the system following the MUC coreference annotation scheme. This experiment allows us to assess the applicability of the MUC annot...

Corpus based coreference resolution for Farsi text

"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

A Corpus-based Analysis of Epistemic Stance Adverbs in Essays Written by Native English Speakers and Iranian EFL Learners

Academic essays entail taking a stance on the truth value of propositions. Epistemic adverbs deal with the speaker's assessment of the truth value of propositions. Employing a corpus-based approach with descriptive statistics and qualitative description, this study explored the use of epistemic stance adverbs in academic essays written by native English speakers and Iranian EFL learners. Follow...

Exploiting parallel texts in the creation of multilingual semantically annotated resources: the MultiSemCor Corpus

In this article we illustrate and evaluate an approach to create high quality linguistically annotated resources based on the exploitation of aligned parallel corpora. This approach is based on the assumption that if a text in one language has been annotated and its translation has not, annotations can be transferred from the source text to the target using word alignment as a bridge. The trans...

Publication date: 2004